Tone modeling using Gaussian process latent variable model for statistical speech synthesis

Authors

  • Decha Moungsri
  • Tomoki Koriyama
  • Takao Kobayashi
Abstract

In continuous Thai speech, tone pronunciation is affected by several factors. One significant factor is stress, which diversifies the F0 contours of tones and affects syllable durations. Our previous studies have shown that a stressed/unstressed syllable context improves tone modeling accuracy. However, stress in Thai is generally unknown for a given input text, and it varies widely in degree. Thus a simple binary stressed/unstressed context is insufficient to represent the variation of stress. In this study, we introduce an unsupervised dimensionality reduction technique, the variational GP-LVM, to represent the diversity of stress. The stress-related information, F0 contour and duration, is projected onto a latent space of lower dimensionality than the original feature space to represent the variation of stress. We then use the latent variable as a stress-related context in a GPR-based speech synthesis framework, which lets us determine the similarity of contextual factors continuously using a kernel function. We examine two approaches to data projection: single-space projection and separated-space projection. Objective and subjective evaluation results show that the proposed technique improves tone modeling.
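The projection-then-kernel idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes plain PCA for the variational GP-LVM (a linear stand-in chosen only to keep the example dependency-free), uses synthetic random data in place of real F0 contours and durations, and assumes one plausible reading of the two projection approaches (joint projection of both streams versus a separate projection per stream). The function names `pca_project` and `rbf_kernel`, the feature sizes, and the latent dimensionalities are all hypothetical.

```python
import numpy as np


def pca_project(features, latent_dim):
    """Linear stand-in for a GP-LVM: project rows onto the top principal axes."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:latent_dim].T


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


# Synthetic placeholder data (assumption): 50 syllables, each with
# 10 F0 samples and one duration value.
rng = np.random.default_rng(0)
n_syllables = 50
f0_contours = rng.normal(size=(n_syllables, 10))
durations = rng.normal(size=(n_syllables, 1))

# Single-space projection (assumed reading): F0 and duration projected jointly.
single_latent = pca_project(np.hstack([f0_contours, durations]), latent_dim=2)

# Separated-space projection (assumed reading): each stream projected on its
# own, then the latent coordinates are concatenated.
separated_latent = np.hstack(
    [pca_project(f0_contours, latent_dim=1), pca_project(durations, latent_dim=1)]
)

# Continuous similarity between the stress contexts of all syllable pairs.
similarity = rbf_kernel(single_latent, single_latent)
```

In this sketch the kernel value between two syllables falls smoothly from 1 toward 0 as their latent coordinates move apart, which is the sense in which a continuous stress context differs from a binary stressed/unstressed label.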


Similar resources

Speech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model

In this work, synthesis of facial animation is done by modelling the mapping between facial motion and speech using the shared Gaussian process latent variable model. Both data are processed separately and subsequently coupled together to yield a shared latent space. This method allows coarticulation to be modelled by having a dynamical model on the latent space. Synthesis of novel animation is...


Unsupervised Stress Information Labeling Using Gaussian Process Latent Variable Model for Statistical Speech Synthesis

In the Thai language, stress is an important prosodic feature that not only affects naturalness but also plays a crucial role in the meaning of phrase-level utterances. A speech synthesis model trained without stress and phrase-level information produces incorrect tones and ambiguous meaning in synthetic speech. Our previous work has shown that manually annotated stress infor...


Dynamic texture modeling and synthesis using multi-kernel Gaussian process dynamic model

Dynamic texture (DT) widely exists in various social video media. Therefore, DT modeling and synthesis plays an important role in social media analyzing and processing. In this paper, we propose a Bayesian-based nonlinear dynamic texture modeling method for dynamic texture synthesis. To capture the non-stationary distribution of DT, we utilize the Gaussian process latent variable model for dime...


Phase-incorporating Speech Enhancement Based on Complex-valued Gaussian Process Latent Variable Model

Traditional speech enhancement techniques modify the magnitude of speech in the time-frequency domain and use the phase of the noisy speech to resynthesize time-domain speech. This work proposes a complex-valued Gaussian process latent variable model (CGPLVM) to directly enhance the complex-valued noisy spectrum, modifying not only the magnitude but also the phase. The main idea that underlies th...


Duration prediction using multi-level model for GPR-based speech synthesis

This paper introduces frame-based Gaussian process regression (GPR) into phone/syllable duration modeling for Thai speech synthesis. The GPR model is designed for predicting frame-level acoustic features using corresponding frame information, which includes relative position in each unit of utterance structure and linguistic information such as tone type and part of speech. Although the GPR-base...




Publication date: 2016